Hands-on Exercise 10a: Processing and Visualising Flow Data

Author

Ho Zi Jun

Published

October 31, 2024

Modified

October 31, 2024

1 Overview

Spatial interaction describes the movement of people, materials, or information between geographical locations. It includes a range of flows—from freight and energy distribution to global trade, flight schedules, and pedestrian traffic. Each interaction can be represented as an origin-destination pair in a matrix, where rows and columns represent the centroids of the origin and destination locations, respectively. This type of matrix is typically referred to as an origin-destination or spatial interaction matrix.

In this hands-on exercise, we will learn how to build an OD matrix by using Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall. By the end of this hands-on exercise, we will be able:

  • to import and extract OD data for a selected time interval,
  • to import and save geospatial data (i.e. bus stops and mpsz) into sf tibble data frame objects,
  • to populate planning subzone code into bus stops sf tibble data frame,
  • to construct desire lines geospatial data from the OD data, and
  • to visualise passenger volume by origin and destination bus stops by using the desired lines data.

2 Getting Started

For the purpose of this exercise, five r packages will be used. They are:

  • sf for importing, integrating, processing and transforming geospatial data.
  • tidyverse for importing, integrating, wrangling and visualising data.
  • tmap for creating elegent and cartographic quality thematic maps.
  • stplanr provides functions for solving common problems in transport planning and modelling such as downloading and cleaning transport datasets; creating geographic “desire lines” from origin-destination (OD) data; route assignment, locally and interfaces to routing services such as CycleStreets.net; calculation of route segment attributes such as bearing and aggregate flow; and ‘travel watershed’ analysis.
  • DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides filtering, pagination, sorting, and many other features in the tables.
pacman::p_load(tmap, sf, DT, stplanr, tidyverse)

3 Preparing the Flow Data

3.1 Importing the OD data

Firstly, we will import the Passenger Volume by Origin Destination Bus Stops data set from LTA DataMall by using read_csv() of readr package.

odbus <- read_csv("data/aspatial/origin_destination_bus_202210.csv")
Rows: 5122925 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): YEAR_MONTH, DAY_TYPE, PT_TYPE
dbl (4): TIME_PER_HOUR, ORIGIN_PT_CODE, DESTINATION_PT_CODE, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

odbus tibble data table is displayed by using the code chunk below.

glimpse(odbus)
Rows: 5,122,925
Columns: 7
$ YEAR_MONTH          <chr> "2022-10", "2022-10", "2022-10", "2022-10", "2022-…
$ DAY_TYPE            <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 10, 10, 7, 11, 16, 16, 20, 7, 7, 11, 11, 8, 11, 11…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <dbl> 65239, 65239, 23519, 52509, 54349, 54349, 43371, 8…
$ DESTINATION_PT_CODE <dbl> 65159, 65159, 23311, 42041, 53241, 53241, 14139, 9…
$ TOTAL_TRIPS         <dbl> 2, 1, 2, 1, 1, 4, 1, 3, 1, 5, 2, 5, 15, 40, 1, 1, …

A quick inspection of the odbus tibble shows that the values in ORIGIN_PT_CODE and DESTINATION_PT_CODE are stored as numeric data. The following code converts these values to character data type for consistency.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

3.2 Extracting the study data

For this hands on exercise, we will extract commuting flows on weekdays for timeslots between 6 and 9 o’clock (AM).

odbus6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.

Table below shows the content of odbus6_9

datatable(odbus6_9)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

We will save the output in rds format for easy retrieval and future usage when needed without the need to repeat the steps above.

write_rds(odbus6_9, "data/rds/odbus6_9.rds")

The code chunk below will be used to import the saved odbus6_9.rds from R environment.

odbus6_9 <- read_rds("data/rds/odbus6_9.rds")

4 Working with Geospatial Data

For the purpose of this exercise, two geospatial datasets will be used. They are:

  • BusStop: This data provides the location of bus stop as of last quarter of 2022.
  • MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019.

Both data sets are in ESRI shapefile format.

4.1 Importing geospatial data

The code chunks below are used to import the datasets and update them with correct ESPG code (i.e. 3414):

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\zjho008\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex10\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
busstop
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 10 features:
   BUS_STOP_N BUS_ROOF_N             LOC_DESC                  geometry
1       22069        B06   OPP CEVA LOGISTICS POINT (13576.31 32883.65)
2       32071        B23         AFT TRACK 13 POINT (13228.59 44206.38)
3       44331        B01              BLK 239  POINT (21045.1 40242.08)
4       96081        B05 GRACE INDEPENDENT CH POINT (41603.76 35413.11)
5       11561        B05              BLK 166 POINT (24568.74 30391.85)
6       66191        B03         AFT CORFE PL POINT (30951.58 38079.61)
7       23389       B02A              PEC LTD   POINT (12476.9 32211.6)
8       54411        B02              BLK 527 POINT (30329.45 39373.92)
9       28531        B09              BLK 536 POINT (14993.31 36905.61)
10      96139        B01              BLK 148  POINT (41642.81 36513.9)
mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `C:\zjho008\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex10\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...
Note
  • st_read() function of sf package is used to import the shapefile into R as a sf data frame.
  • st_transform() function of sf package is used to transform the projection to crs 3414.

Similar to the section ### Extracting the study data the code chunk below will be used to write mpsz sf tibble data frame into an rds file for future usage.

mpsz <- write_rds(mpsz, "data/rds/mpsz.rds")

5 Geospatial data wrangling

5.1 Combining Busstop and mpsz

The code chunk below transfers the planning subzone code (SUBZONE_C) from the mpsz sf data frame to the busstop sf data frame.

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Note
  • st_intersection() is used to perform point and polygon overlay and the output will be in point sf object.
  • select() of dplyr package is then used to retain only BUS_STOP_N and SUBZONE_C in the busstop_mpsz sf data frame.
  • five bus stops are excluded in the resultant data frame because they are outside of the Singapore boundary.
datatable(busstop_mpsz)

Before moving to the next step, we will save the output into rds format.

write_rds(busstop_mpsz, "data/rds/busstop_mpsz.rds")  

Next step is to append the planning subzone code from busstop_mpsz data frame onto odbus6_9 data frame.

od_data <- left_join(odbus6_9 , busstop_mpsz,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C,
         DESTIN_BS = DESTINATION_PT_CODE)
Warning in left_join(odbus6_9, busstop_mpsz, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 55491 of `x` matches multiple rows in `y`.
ℹ Row 161 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Before continuing a good practice is to check for duplicated records:

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

If there are duplicate records, the code chunk below will be used to retain only the unique records.

od_data <- unique(od_data)

It is also a good practice to confirm if the duplicated records issue has been addressed fully.

anyDuplicated(od_data)
[1] 0

Next, we will update od_data data frame with the planning subzone codes.

od_data <- left_join(od_data , busstop_mpsz,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 
Warning in left_join(od_data, busstop_mpsz, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 74 of `x` matches multiple rows in `y`.
ℹ Row 1379 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
od_data <- unique(od_data)
od_data <- od_data %>%
  rename(DESTIN_SZ = SUBZONE_C) %>%
  drop_na() %>%
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.

Code chunk below to save the output into an rds file format.

write_rds(od_data, "data/rds/od_data_fii.rds")
od_data_fii <- read_rds("data/rds/od_data.rds")

6 Visualising Spatial Interaction

In this section, we will learn how to prepare a desire line by using stplanr package.

6.1 Removing intra-zonal flows

We will not plot the intra-zonal flows. The code chunk below will be used to remove intra-zonal flows. Afterwards the output will be saved as rds file format.

od_data_fij <- od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]
write_rds(od_data_fij, "data/rds/od_data_fij.rds")
od_data_fij <- read_rds("data/rds/od_data_fij.rds")

6.2 Creating desire lines

In this code chunk below, od2line() of stplanr package is used to create the desire lines.

flowLine <- od2line(flow = od_data_fij, 
                    zones = mpsz,
                    zone_code = "SUBZONE_C")
Creating centroids representing desire line start and end points.
write_rds(flowLine, "data/rds/flowLine.rds")
flowLine <- read_rds("data/rds/flowLine.rds")

6.3 Visualising the desire lines

To visualise the result of the desire lines, the code chunk below is used:

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.66. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

Warning

The rendering process takes more time because of the transparency argument (i.e. alpha)

When flow data is particularly messy and highly skewed, as seen above, it’s often more effective to focus on selected flows, such as those greater than or equal to 5000, as shown below.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
  filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.5)
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.66. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.